BitCurator Access

• Two-year project (October 1, 2014 - September 30, 2016) at 
School of Information and Library Science, University of 
North Carolina at Chapel Hill 

• Funded by Andrew W. Mellon Foundation 

• Developing open-source software to support access to disk 
images. Three core areas of focus: 

- Tools and reusable libraries to support web access 
services for disk images 

- Analyzing contents of file systems and associated 
metadata 

- Redacting complex born-digital objects (disk 
images) and emulated access to redacted images 
Redacting Born-Digital Materials



• Locating relevant items can be problematic 

- compressed files 

- proprietary formats 

- formatting variations 

- encryption


• Digital forensics tools can help 

- Scan file systems and disk images block-by-block 

- Extract features when possible 

- Report on materials that resist analysis 
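
A minimal sketch of block-by-block scanning with feature extraction, assuming a raw byte stream and a deliberately simplified SSN-like pattern. The function name `scan_blocks`, the pattern, and the `block_size` and `margin` defaults are illustrative assumptions, not the behavior of any particular forensics tool.

```python
import io
import re

# Simplified SSN-like pattern (assumption: real scanners validate more strictly).
SSN_PATTERN = re.compile(rb"(?<!\d)\d{3}-\d{2}-\d{4}(?!\d)")


def scan_blocks(stream, block_size=4096, margin=16):
    """Yield (absolute_offset, matched_bytes) for SSN-like strings in a stream.

    Each block is prefixed with the last `margin` bytes of the previous window
    so that a feature straddling a block boundary is still seen whole.
    """
    offset = 0        # absolute offset of the block about to be read
    tail = b""        # trailing bytes carried over from the previous window
    reported = set()  # offsets already yielded (a match may reappear in the tail)
    while True:
        block = stream.read(block_size)
        if not block:
            break
        window = tail + block
        base = offset - len(tail)  # absolute offset of window[0]
        for match in SSN_PATTERN.finditer(window):
            position = base + match.start()
            if position not in reported:
                reported.add(position)
                yield position, match.group()
        tail = window[-margin:]
        offset += len(block)


# Made-up sample: the pattern is split across the first block boundary.
sample = b"A" * 4090 + b"123-45-6789" + b"B" * 100
hits = list(scan_blocks(io.BytesIO(sample)))
print(hits)
```

The margin must be at least as long as the longest feature being searched for; otherwise a feature split across two blocks can be missed.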



Examples of Potentially Private and Sensitive Information

• Personal identifiers (e.g. SSNs, DOBs, driver's license
numbers, corporate and government IDs)

• Financial information (e.g. credit card numbers) 

• Geolocation data 

• Email messages, email addresses, attachments 

• Traces of online activity (e.g. search histories, 
web caches, domain names, IP addresses) 

• Recoverable data from deleted files 

• Partially overwritten data 
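
As one illustration of how a pattern-based search for financial identifiers can cut false positives, the sketch below applies the Luhn checksum to candidate credit card numbers. The function name `luhn_valid` and the 13 to 19 digit length bounds are assumptions chosen for the example; this is not presented as the validation logic of any specific tool.

```python
def luhn_valid(number):
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    # Assumed plausibility bound on card number length.
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


# 4111 1111 1111 1111 is a widely used test number, not a real account.
print(luhn_valid("4111 1111 1111 1111"))
print(luhn_valid("4111 1111 1111 1112"))
```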



We know how to look for these things 



Bulk Extractor Viewer

[Screenshot: the "Run bulk_extractor" dialog in Bulk Extractor Viewer]

Required Parameters

- Scan: Image File, Raw Device, or Directory of Files

- Output Feature Directory

General Options

- Use Banner File

- Use Alert List File

- Use Stop List File

- Use Find Regex Text File

- Use Find Regex Text

- Use Random Sampling

Tuning Parameters

- Use Context Window Size

- Use Page Size

- Use Margin Size

- Use Block Size

- Use Number of Threads

- Use Maximum Recursion Depth

- Use Wait Time

Parallelizing

- Use start processing at offset

See: http://www.forensicswiki.org/wiki/Bulk_extractor
Developed by Simson Garfinkel



...and we know sensitive information 
turns up in all sorts of interesting places: 



[Screenshot: bulk_extractor results in which a long run of OUTLOOK entries is labeled DOB or SSN]

SSNs and DOBs identified in large PST 
backup collection using bulk_extractor 
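
A hedged sketch of how hits like these could be tallied from a bulk_extractor feature file. It assumes the tab-separated layout of offset (forensic path), feature, and context, with comment lines starting with "#". The function name `parse_feature_lines` and the sample lines are made up for illustration.

```python
from collections import Counter


def parse_feature_lines(lines):
    """Yield (forensic_path, feature, context) from feature-file lines."""
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and the "#" comment/header lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t", 2)
        if len(fields) < 2:
            continue
        path, feature = fields[0], fields[1]
        context = fields[2] if len(fields) > 2 else ""
        # The path stays a string: it may describe nested containers, not just an offset.
        yield path, feature, context


# Fabricated sample records (not real data).
sample_lines = [
    "# Feature-File-Version: 1.1\n",
    "1024\t000-00-0000\tSSN: 000-00-0000\\x0A\n",
    "2048\t000-00-0000\tssn 000-00-0000 dob\n",
    "4096-ZIP-12\t111-00-0000\tSSN 111-00-0000\n",
]
records = list(parse_feature_lines(sample_lines))
counts = Counter(feature for _, feature, _ in records)
print(counts.most_common())
```

Counting by feature value gives a quick view of which identifiers recur across a collection and therefore deserve review first.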



Encryption may be a marker for sensitivity
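
One rough way to flag regions that may be encrypted is to measure byte-level Shannon entropy per block, sketched below. The 7.5 bits-per-byte threshold and the function names are assumptions for illustration. Compressed data also scores high, so a hit is only a prompt for closer inspection.

```python
import math
from collections import Counter


def shannon_entropy(block):
    """Return the Shannon entropy of a bytes object, in bits per byte (0 to 8)."""
    if not block:
        return 0.0
    total = len(block)
    counts = Counter(block)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def high_entropy_blocks(data, block_size=4096, threshold=7.5):
    """Yield offsets of blocks whose entropy meets the (assumed) threshold."""
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        if shannon_entropy(block) >= threshold:
            yield offset


# Made-up sample: one block of zeros, then one block using every byte value equally.
sample = b"\x00" * 4096 + bytes(range(256)) * 16
flagged = list(high_entropy_blocks(sample))
print(flagged)
```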




[Figure: scanned forms recovered from the collection, including a claim form with EMPLOYEE NAME, SOCIAL SECURITY NUMBER, EMPLOYER NAME, and DATE OF ACCIDENT fields, and a TRICARE Explanation of Benefits with Sponsor SSN, Sponsor Name, Beneficiary Name, and Claim Number fields]



Other PII and sensitive data may be harder to find, requiring e.g.:

• OCR 

• named entity 
recognition 

• partial file 
reconstruction 

• format-specific tools 

• visual inspection 



Example of EXIF Metadata from a JPEG File (Generated Using exiftool*) 



---- ExifTool ----
ExifTool Version Number      : 9.38
---- System ----
File Name                    : IMG_20130823_151811.jpg
Directory                    : C:/Users/callee/Documents/images/digital-forensics-lab
File Size                    : 1785 kB
File Modification Date/Time  : 2013:08:23 16:36:44-04:00
File Access Date/Time        : 2013:10:14 17:13:02-04:00
File Creation Date/Time      : 2013:08:23 16:36:44-04:00
File Permissions             : rw-rw-rw-
---- File ----
File Type                    : JPEG
MIME Type                    : image/jpeg
Exif Byte Order              : Big-endian (Motorola, MM)
Image Width                  : 2592
Image Height                 : 1944
Encoding Process             : Baseline DCT, Huffman coding
Bits Per Sample              : 8
Color Components             : 3
Y Cb Cr Sub Sampling         : YCbCr4:2:0 (2 2)
---- GPS ----
GPS Img Direction            : 83
GPS Img Direction Ref        : Magnetic North
GPS Latitude Ref             : North
GPS Latitude                 : 35 deg 55' 2.24"
GPS Longitude Ref            : West
GPS Longitude                : 79 deg 2' 57.55"
GPS Altitude Ref             : Above Sea Level
GPS Altitude                 : 0 m
GPS Time Stamp               : 19:18:06
GPS Processing Method        : NETWORK
GPS Date Stamp               : 2013:08:23
---- IFD0 ----
Orientation                  : Unknown (0)
Camera Model Name            : Galaxy Nexus
Modify Date                  : 2013:08:23 15:18:11
Y Cb Cr Positioning          : Centered
Y Resolution                 : 72
Resolution Unit              : inches
X Resolution                 : 72
Make                         : Samsung
---- ExifIFD ----
Create Date                  : 2013:08:23 15:18:11
Date/Time Original           : 2013:08:23 15:18:11
Exif Version                 : 0220
Flash Energy                 : 0
Image Unique ID              : OAEL01
Exposure Time                : 1/17
ISO                          : 125, 0, 0

Geolocation data embedded in EXIF metadata from a smartphone photo

*http://www.sno.phy.queensu.ca/~phil/exiftool/ (Also available through the BitCurator environment)
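
To show how precisely the GPS fields above locate the photo, the following sketch converts the degrees/minutes/seconds strings from the listing into signed decimal degrees. The helper name `dms_to_decimal` and its parsing pattern are assumptions written for this example.

```python
import re

# Matches the "D deg M' S" layout seen in the listing above.
DMS = re.compile(r"(\d+)\s*deg\s*(\d+)'\s*([\d.]+)")


def dms_to_decimal(value, ref):
    """Convert a 'D deg M' S"' string plus a hemisphere reference to decimal degrees."""
    match = DMS.search(value)
    if match is None:
        raise ValueError(f"unrecognized coordinate: {value!r}")
    degrees, minutes, seconds = (float(part) for part in match.groups())
    decimal = degrees + minutes / 60 + seconds / 3600
    # South and West are negative by convention.
    if ref.strip().lower() in ("south", "west", "s", "w"):
        return -decimal
    return decimal


latitude = dms_to_decimal("35 deg 55' 2.24\"", "North")
longitude = dms_to_decimal("79 deg 2' 57.55\"", "West")
print(f"{latitude:.6f}, {longitude:.6f}")
```

Coordinates at this precision narrow the location to a few meters, which is why embedded geolocation is treated as potentially sensitive.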



Automated Redaction and Access Options 



• Option A: Redact from live image in EaaS via copy-on-write overlay

- Source media

- Create copy-on-write overlay

- Redact items for session via overlay

• Option B: EaaS access to previously redacted image

- Redacted image copy (raw or forensic repackage)

- Create copy-on-write overlay

• Option C: Browse non-live file system with redaction mask

- bca-webtools

• Inputs shown for the redaction workflow

- Forensic disk image

- Analyze with bulk_extractor and fiwalk

- List of items to be redacted

- Redaction script (annotated DFXML)


EaaS = Emulation-as-a-Service. http://bw-fla.uni-freiburg.de/ 
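
The idea of masking items without altering the source image can be sketched as a read-only view that blanks listed byte ranges on the fly. The class `RedactedView`, its `read_at` method, and the (start, length) range format are hypothetical names for this illustration. They are not the EaaS overlay mechanism or the bca-webtools implementation.

```python
import io


class RedactedView:
    """Read-only view of an image that masks byte ranges at read time."""

    def __init__(self, image, ranges, fill=b"\x00"):
        if len(fill) != 1:
            raise ValueError("fill must be a single byte")
        self._image = image            # seekable binary file object
        self._ranges = sorted(ranges)  # list of (start_offset, length)
        self._fill = fill

    def read_at(self, offset, size):
        """Return `size` bytes from `offset`, with redacted ranges overwritten."""
        self._image.seek(offset)
        data = bytearray(self._image.read(size))
        for start, length in self._ranges:
            lo = max(start, offset)
            hi = min(start + length, offset + len(data))
            if lo < hi:
                data[lo - offset:hi - offset] = self._fill * (hi - lo)
        return bytes(data)


# Made-up sample content standing in for a disk image.
original = b"name=Jane ssn=123-45-6789 end"
secret = b"123-45-6789"
source = io.BytesIO(original)
view = RedactedView(source, [(original.index(secret), len(secret))], fill=b"X")
print(view.read_at(0, len(original)))
```

Because the mask is applied only when bytes are read, the underlying image stays bit-for-bit intact and a redaction list can be revised without re-imaging.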






BCA (BitCurator Access) Webtools 

Prototype demonstrating the integration of digital forensics software
libraries with lightweight web services tools

Drop disk images in a local or network-accessible location, start up the 
service, and start browsing. 



• Most analysis runs server-side (via The Sleuth Kit and DFXML Python bindings, among others)

• Service is database-agnostic (we're using PostgreSQL)

• Automatic metadata production (Digital Forensics XML (DFXML), PREMIS, others)



[Architecture diagram; components shown:]

• Web server: lighttpd (or Apache)

• Flask web application

- Jinja templates

- SQLAlchemy

- pytsk3, over The Sleuth Kit (TSK)

• Database server: PostgreSQL (disk image metadata DB)

• Client, over Internet or intranet: web browser, with local / network storage for downloaded items



https://github.com/kamwoods/bca-webtools 



Sunitha Misra, Christopher A. Lee, and Kam Woods, "A Web Service for File-Level Access to 
Disk Images," Code4Lib Journal 25 (2014), http://journal.code4lib.org/articles/9773 




bca-webtools 



The bca-webtools software provides access to forensically-packaged and raw disk images. Supported file systems 
include FAT16, FAT32, NTFS, HFS+, ISO9660, and EXT2/3/4. Click on 'Browse' to navigate through the file system(s)
within the disk image, or 'Download' to download the complete disk image. 



Image Name                         Image Info
charlie-work-usb-2009-12-11.E01    Download





bca-webtools 



Browse directories and download files: 



d/r  Filename
r    $AttrDef
r    $BadClus
r    $Bitmap
r    $Boot
d    $Extend
r    $LogFile
r    $MFT
r    $MFTMirr
r    $Secure
r    $UpCase
r    $Volume
     01.zip              2009-11-24T21:21:16Z
     astronaut.jpg       2009-11-24T21:33:33Z
     astronaut1.jpg      2009-11-24T21:43:42Z
     Email               2009-12-10T22:27:55Z
     Immortality         2009-11-24T21:55:45Z
     invsecr2.exe        2009-11-19T18:42:25Z
     microscope.jpg      2009-11-24T21:27:51Z
     microscope1.jpg     2009-11-24T22:19:21Z
     Nitroba work.odt    2009-11-19T21:26:42Z
     $OrphanFiles        1970-01-01T00:00:00Z
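
A listing of this kind could be produced server-side along the following lines. This is a sketch, not the bca-webtools code. `format_entry` and `list_root` are hypothetical helpers, the use of the modification time for the timestamp column is an assumption, and the pytsk3 calls assume a raw image (E01 images need additional libewf bindings that are not shown).

```python
from datetime import datetime, timezone


def format_entry(name, is_dir, mtime):
    """Return (d/r flag, filename, ISO 8601 UTC timestamp) for one entry."""
    kind = "d" if is_dir else "r"
    stamp = datetime.fromtimestamp(mtime, tz=timezone.utc).strftime(
        "%Y-%m-%dT%H:%M:%SZ"
    )
    return kind, name, stamp


def list_root(image_path, partition_offset=0):
    """List the root directory of a raw disk image using pytsk3."""
    # Third-party dependency, imported lazily so format_entry works without it.
    import pytsk3

    image = pytsk3.Img_Info(image_path)
    filesystem = pytsk3.FS_Info(image, offset=partition_offset)
    rows = []
    for entry in filesystem.open_dir(path="/"):
        name = entry.info.name.name.decode("utf-8", errors="replace")
        meta = entry.info.meta
        # Skip self/parent links and entries with no metadata record.
        if name in (".", "..") or meta is None:
            continue
        is_dir = meta.type == pytsk3.TSK_FS_META_TYPE_DIR
        rows.append(format_entry(name, is_dir, meta.mtime))
    return rows


print(format_entry("01.zip", False, 1259097676))
```

An entry with a zero timestamp formats as 1970-01-01T00:00:00Z, the same value shown for $OrphanFiles above.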





Christopher A. Lee, Kam Woods, Matthew Kirschenbaum, and Alexandra Chassanoff



http://www.bitcurator.net/docs/bitstreams-to-heritage.pdf 



Building and Sustaining BitCurator 
through Community Engagement 




Educopia and Communities

The Educopia Institute provides 
backbone infrastructure and 
facilitation support to 
strengthen growing community 
networks. Such services are 
provided both on a project basis 
and through formal 
programmatic partnerships. 



BitCurator Consortium



Facilitator: Katherine Skinner

Websites: BitCurator Projects, BitCurator Wiki



Helping cultural organizations acquire and curate 
born-digital materials through open-source digital 
forensics tools. 

The BitCurator Consortium is an independent, community-led membership 
association that serves as the host and center of administrative, user and 
community support for the BitCurator environment. It is a continuation of 
the BitCurator project (2011-2014), funded by the Mellon Foundation and led by
the School of Information and Library Science at the University of North 
Carolina, Chapel Hill (SILS) and the Maryland Institute for Technology in the 
Humanities (MITH). 

The BitCurator environment is a set of open-source tools adapted from the 
digital forensics industry for use by libraries, archives, and museums. The tools
extract digital objects, create metadata, ensure integrity, and identify sensitive 
data, providing libraries, archives, and museums with information to make 
appropriate processing decisions. The BitCurator environment is freely 
distributed under an open-source license. It can be installed as a Linux 
environment; run as a virtual machine on top of most contemporary operating 
systems; or run as individual software tools, packages, support scripts, and 
documentation. 



Recent Hosted Events 

BitCurator Users Forum 2015, 09 January 2015



http://www.educopia.org/community/bcc/



BitCurator, BitCurator Access and BitCurator Consortium Resources

BitCurator Access

• Development areas, including web access to disk images

• Support: software, documentation, Google Group

http://access.bitcurator.net/



BitCurator

• Project overview

• Publications

• News (e.g. "BitCurator 0.7.4 Now Available")

• Consortium membership

• Video spotlight: "Screencast Tutorial: Using Bulk Extractor to find potentially sensitive information"

http://www.bitcurator.net/



On Twitter: @bitcurator



